470 research outputs found

    Genome-wide inference of ancestral recombination graphs

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. Preliminary results also indicate that our methods can be used to gain insight into complex features of human population structure, even with a noninformative prior distribution.
    Comment: 88 pages, 7 main figures, 22 supplementary figures. This version contains a substantially expanded genomic data analysis

    Evolution at the Subgene Level: Domain Rearrangements in the Drosophila Phylogeny

    Supplementary sections 1–13, tables S1–S10, and figures S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events, respectively, retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species. National Science Foundation (U.S.) (Graduate Research Fellowship); National Science Foundation (U.S.) (CAREER award NSF 0644282

    Methods and analysis of genome-scale gene family evolution across multiple species

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 123-136). The fields of genomics and evolution have continually benefited from one another in their common goal of understanding the biological world. This partnership has been accelerated by ever-increasing sequencing and high-throughput technologies. Although the future of genomic and evolutionary studies is bright, new models and methods will be needed to address the growing and changing challenges of large-scale datasets. In this work, I explore how evolution generates the diversity of life we see in modern species, specifically the evolution of new genes and functions. By reconstructing the history of the diverse sequences present in modern species, we can improve our understanding of their function and evolutionary importance. Performing such an analysis requires a principled and efficient means of computing the most probable evolutionary scenarios. To address these challenges, I introduce a new model of gene family evolution as well as a new method, SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss rates, speciation times, and correlated substitution rate variation across both species and loci. I have implemented and applied this method on two clades of fully-sequenced species, 12 Drosophila and 16 fungal genomes, as well as simulated phylogenies, and find dramatic improvements in reconstruction accuracy as compared to the most popular existing methods, including those that take the species tree into account.
Lastly, I use the SPIMAP method to reconstruct the evolutionary history of all gene families in 16 fungal species including several relatives of the pathogenic species C. albicans. From these reconstructions, we identify several families enriched with duplications and positive selection in pathogenic lineages. These reconstructions shed light on the evolution of these species and provide a better understanding of the genes involved in pathogenicity. by Matthew D. Rasmussen. Ph.D.

    Probabilistic framework for genome-wide phylogeny and ortholog determination

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (leaves 63-65). Comparative genomics of multiple related species has emerged as a powerful tool for genome signal discovery. To that end, dozens of mammalian, fly, and fungal genomes have been fully sequenced. Making use of these genomes requires rigorous computational methods for determining the evolutionary history of every gene and region. In particular, comparative analysis requires the ability to distinguish between orthologous and paralogous regions. Current approaches to ortholog identification work adequately for pairs of species but are ineffective for multiple complete genomes. This thesis presents a new phylogenetic reconstruction method, SINDIR, that is designed specifically for genome-wide orthology determination. Unlike any other method, SINDIR exploits the known evolutionary history of a set of species to infer the history of their genes. This is done by learning a probabilistic model of evolution from a trusted set of unambiguous orthologs. Given this model, SINDIR can find the maximum likelihood phylogenetic tree for any set of genes. In a novel technique, synteny maps are used to train and evaluate the evolutionary model on both simulated and real sequence data. SINDIR avoids errors commonly committed by current methods and achieves a significantly improved accuracy of orthology determination. by Matthew D. Rasmussen. S.M.

    Uncovering the components of the Francisella tularensis virulence stealth strategy

    Over the last decade, studies on the virulence of the highly pathogenic intracellular bacterium Francisella tularensis have increased dramatically. The organism produces an inert LPS and a capsule, and escapes the phagosome (FPI genes mediate phagosomal escape) to grow in the cytosol of a variety of host cell types, including epithelial cells, endothelial cells, dendritic cells, macrophages, and neutrophils. This review focuses on the work that has identified and characterized individual virulence factors of this organism, and we hope to highlight how these factors collectively function to produce its pathogenic strategy. In addition, several recent studies have characterized F. tularensis mutants that induce host immune responses not observed with wild-type F. tularensis strains and that can induce protection against challenge with virulent F. tularensis. As more detailed studies with attenuated strains are performed, it will be possible to see how host models develop acquired immunity to Francisella. Collectively, detailed insights into the mechanisms of virulence of this pathogen are emerging that will allow the design of anti-infective strategies.

    Real-Time Projection to Verify Plan Success During Execution

    The Mission Data System provides a framework for modeling complex systems in terms of system behaviors and goals that express intent. Complex activity plans can be represented as goal networks that express the coordination of goals on different state variables of the system. Real-time projection extends the ability of this system to verify plan achievability (all goals can be satisfied over the entire plan) into the execution domain so that the system is able to continuously re-verify a plan as it is executed, and as the states of the system change in response to goals and the environment. Previous versions were able to detect and respond to goal violations only when they actually occurred during execution. This new capability enables the prediction of future goal failures; specifically, goals that were previously found to be achievable but are no longer achievable due to unanticipated faults or environmental conditions. Early detection of such situations enables operators or an autonomous fault response capability to deal with the problem at a point that maximizes the available options. For example, this system has been applied to the problem of managing battery energy on a lunar rover as it is used to explore the Moon. Astronauts drive the rover to waypoints and conduct science observations according to a plan that is scheduled and verified to be achievable with the energy resources available. As the astronauts execute this plan, the system uses this new capability to continuously re-verify the plan as energy is consumed to ensure that the battery will never be depleted below safe levels across the entire plan.

    Behavioral metabolution: the adaptive and evolutionary potential of metabolism-based chemotaxis

    We use a minimal model of metabolism-based chemotaxis to show how a coupling between metabolism and behavior can affect evolutionary dynamics in a process we refer to as behavioral metabolution. This mutual influence can function as an in-the-moment, intrinsic evaluation of the adaptive value of a novel situation, such as an encounter with a compound that activates new metabolic pathways. Our model demonstrates how changes to metabolic pathways can lead to improvement of behavioral strategies, and conversely, how behavior can contribute to the exploration and fixation of new metabolic pathways. These examples indicate the potentially important role that the interplay between behavior and metabolism could have played in shaping adaptive evolution in early life and protolife. We argue that the processes illustrated by these models can be interpreted as an unorthodox instantiation of the principles of evolution by random variation and selective retention. We then discuss how the interaction between metabolism and behavior can facilitate evolution through (i) increasing exposure to environmental variation, (ii) making more likely the fixation of some beneficial metabolic pathways, (iii) providing a mechanism for in-the-moment adaptation to changes in the environment and to changes in the organization of the organism itself, and (iv) generating conditions that are conducive to speciation
